Abstract



The goal of cross-domain object matching (CDOM) is to find correspondence between two sets of objects in different domains in an unsupervised way. Photo album summarization is a typical application of CDOM, where photos are automatically aligned into a designed frame expressed in the Cartesian coordinate system. CDOM is usually formulated as finding a mapping from objects in one domain (photos) to objects in the other domain (frame) so that the pairwise dependency is maximized. A state-of-the-art CDOM method employs a kernel-based dependency measure, but it has the drawback that the kernel parameter needs to be determined manually. In this paper, we propose alternative CDOM methods that can naturally address the model selection problem. Through experiments on image matching, unpaired voice conversion, and photo album summarization tasks, the effectiveness of the proposed methods is demonstrated.



1 Introduction 

The objective of cross-domain object matching (CDOM) is to match two sets of objects in different domains. For instance, in photo album summarization, photos are automatically assigned to a designed frame expressed in the Cartesian coordinate system. A typical approach to CDOM is to find a mapping from objects in one domain (photos) to objects in the other domain (frame) so that the pairwise dependency is maximized. In this scenario, accurately evaluating the dependence between objects is a key challenge.



Kernelized sorting (KS) (Jebara, 2004) tries to find a mapping between two domains that maximizes the mutual information (MI) (Cover and Thomas, 2006) under the Gaussian assumption. However, since the Gaussian assumption may not be fulfilled in practice, this method (which we refer to as KS-MI) tends to perform poorly.

To overcome the limitation of KS-MI, Quadrianto et al. (2010) proposed using the kernel-based dependence measure called the Hilbert-Schmidt independence criterion (HSIC) (Gretton et al., 2005) for KS. Since HSIC is distribution-free, KS with HSIC (which we refer to as KS-HSIC) is more flexible than KS-MI. However, HSIC includes a tuning parameter (more specifically, the Gaussian kernel width), and its choice is crucial to obtain better performance (see also Jagarlamudi et al., 2010). Although using the median distance between sample points as the Gaussian kernel width is a common heuristic in kernel-based dependence measures (see, e.g., Fukumizu et al., 2009a), this does not always perform well in practice.



In this paper, we propose two alternative CDOM methods that can naturally address the model selection problem. The first method employs another kernel-based dependence measure based on the normalized cross-covariance operator (NOCCO) (Fukumizu et al., 2009b), which we refer to as KS-NOCCO. The NOCCO-based dependence measure was shown to be asymptotically independent of the choice of kernels. Thus, KS-NOCCO is expected to be less sensitive to the kernel parameter choice, which is an advantage over HSIC.

The second method uses least-squares mutual information (LSMI) (Suzuki et al., 2009) as the dependence measure, which is a consistent estimator of the squared-loss mutual information (SMI) achieving the optimal convergence rate. We call this method least-squares object matching (LSOM). An advantage of LSOM is that cross-validation (CV) with respect to the LSMI criterion is possible. Thus, all the tuning parameters such as the Gaussian kernel width and the regularization parameter can be objectively determined by CV.

Through experiments on image matching, unpaired voice conversion, and photo album summarization tasks, LSOM is shown to be the most promising approach to CDOM.

2 Problem Formulation 

In this section, we formulate the problem of cross-domain object matching (CDOM).

The goal of CDOM is, given two sets of samples of the same size, $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$, to find a mapping that "matches" them well.

Let $\pi$ be a permutation function over $\{1, \ldots, n\}$, and let $\Pi$ be the corresponding permutation indicator matrix, i.e.,

$$\Pi \in \{0, 1\}^{n \times n}, \quad \Pi 1_n = 1_n, \quad \text{and} \quad \Pi^\top 1_n = 1_n,$$

where $1_n$ is the $n$-dimensional vector with all ones and $^\top$ denotes the transpose. Let us denote the samples matched by a permutation $\pi$ by

$$Z(\Pi) := \{(x_i, y_{\pi(i)})\}_{i=1}^{n}.$$

The optimal permutation, denoted by $\Pi^*$, can be obtained as the maximizer of the dependency between the two sets $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$:

$$\Pi^* := \operatorname*{argmax}_{\Pi} D(Z(\Pi)),$$

where $D$ is some dependence measure.
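As a small illustration of the notation above, the following Python sketch builds a permutation indicator matrix and the matched set $Z(\Pi)$. The function names, the 0-based indexing, and the convention that column $i$ of $\Pi$ has its single 1 in row $\pi(i)$ are assumptions made here for illustration; the text does not fix them.

```python
import numpy as np

def permutation_matrix(pi):
    """Indicator matrix Pi with Pi[pi[i], i] = 1 (assumed convention)."""
    n = len(pi)
    Pi = np.zeros((n, n), dtype=int)
    Pi[pi, np.arange(n)] = 1
    return Pi

def matched_pairs(x, y, pi):
    """Z(Pi) = [(x_i, y_{pi(i)})], using 0-based indices."""
    return [(x[i], y[pi[i]]) for i in range(len(pi))]

pi = [2, 0, 1]                                 # hypothetical permutation of {0, 1, 2}
Pi = permutation_matrix(pi)
ones = np.ones(3, dtype=int)
row_ok = np.array_equal(Pi @ ones, ones)       # Pi 1_n = 1_n
col_ok = np.array_equal(Pi.T @ ones, ones)     # Pi^T 1_n = 1_n
pairs = matched_pairs(["a", "b", "c"], [10, 20, 30], pi)
```

Under this convention, `Pi.T @ y` reorders a vector `y` into `(y[pi[0]], ..., y[pi[n-1]])`, so a kernel matrix on the reordered samples is `Pi.T @ L @ Pi`.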



3 Existing Methods

In this section, we review two existing methods for CDOM, and point out their weaknesses.

3.1 Kernelized Sorting with Mutual Information

Kernelized sorting with mutual information (KS-MI) (Jebara, 2004) matches objects in different domains so that MI between matched pairs is maximized. Here, we review KS-MI following the alternative derivation provided in Quadrianto et al. (2010).

MI is one of the popular dependence measures between random variables. For random variables $X$ and $Y$, MI is defined as follows (Cover and Thomas, 2006):

$$\mathrm{MI}(Z) = \iint p(X, Y) \log \frac{p(X, Y)}{p(X) p(Y)} \, dX \, dY,$$

where $p(X, Y)$ denotes the joint density of $X$ and $Y$, and $p(X)$ and $p(Y)$ are the marginal densities of $X$ and $Y$, respectively. MI is zero if and only if $X$ and $Y$ are independent, and thus it may be used as a dependency measure. Let $H(X)$, $H(Y)$, and $H(X, Y)$ be the entropies of $X$ and $Y$ and the joint entropy of $X$ and $Y$, respectively:

$$H(X) = -\int p(X) \log p(X) \, dX,$$

$$H(Y) = -\int p(Y) \log p(Y) \, dY,$$

$$H(X, Y) = -\iint p(X, Y) \log p(X, Y) \, dX \, dY.$$

Then the mutual information between $X$ and $Y$ can be written as

$$\mathrm{MI}(Z) = H(X) + H(Y) - H(X, Y).$$

Since $H(X)$ and $H(Y)$ are independent of the permutation $\Pi$, maximizing mutual information is equivalent to minimizing the joint entropy $H(X, Y)$. If $p(X, Y)$ is Gaussian with covariance matrix $\Sigma$, the joint entropy is expressed as

$$H(X, Y) = \frac{1}{2} \log |\Sigma| + \mathrm{Const.},$$

where $|\Sigma|$ denotes the determinant of the matrix $\Sigma$.

Now, let us assume that $x$ and $y$ are jointly normal in some reproducing kernel Hilbert spaces (RKHSs) endowed with the joint kernel $K(x, x') L(y, y')$, where $K(x, x')$ and $L(y, y')$ are reproducing kernels for $x$ and $y$, respectively. Then KS-MI is formulated as follows:

$$\min_{\Pi} \log \left| \Gamma \left( K \circ (\Pi^\top L \Pi) \right) \Gamma \right|, \tag{1}$$

where $K = \{K(x_i, x_j)\}_{i,j=1}^{n}$ and $L = \{L(y_i, y_j)\}_{i,j=1}^{n}$ are kernel matrices, $\circ$ denotes the Hadamard product (a.k.a. the element-wise product), $\Gamma = I_n - \frac{1}{n} 1_n 1_n^\top$ is the centering matrix, and $I_n$ is the $n$-dimensional identity matrix.

A critical weakness of KS-MI is the Gaussian assumption, which may not be fulfilled in practice.

A critical weakness of KS-MI is the Gaussian assump- 
tion, which may not be fulfilled in practice. 

3.2 Kernelized Sorting with Hilbert-Schmidt 
Independence Criterion 

Kernelized sorting with Hilbert-Schmidt independence criterion (KS-HSIC) matches objects in different domains so that HSIC between matched pairs is maximized.

HSIC is a kernel-based dependence measure given as follows (Gretton et al., 2005):

$$\mathrm{HSIC}(Z) = \mathrm{tr}(\bar{K} \bar{L}),$$

where $\bar{K} = \Gamma K \Gamma$ and $\bar{L} = \Gamma L \Gamma$ are the centered kernel matrices for $x$ and $y$, respectively. Note that smaller HSIC scores mean that $X$ and $Y$ are closer to being independent.
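The HSIC score can be computed directly from two kernel matrices. The sketch below is only an illustration: the Gaussian kernel with width 0.5, the equally spaced toy inputs, and the function names are arbitrary choices made here. It compares a perfectly paired sample with the same sample after shuffling one side.

```python
import numpy as np

def gaussian_kernel(a, sigma):
    """Gaussian kernel matrix exp(-||a_i - a_j||^2 / (2 sigma^2)) for rows of a."""
    sq = np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def hsic(K, L):
    """Empirical HSIC score tr(Kbar Lbar) with Kbar = G K G and Lbar = G L G."""
    n = K.shape[0]
    G = np.eye(n) - np.ones((n, n)) / n          # centering matrix Gamma
    return float(np.trace(G @ K @ G @ G @ L @ G))

x = np.linspace(-1.0, 1.0, 30)[:, None]
K = gaussian_kernel(x, sigma=0.5)                # sigma = 0.5 is an arbitrary choice
perm = np.random.default_rng(0).permutation(30)
L_paired = gaussian_kernel(x, sigma=0.5)         # y_i = x_i: perfectly dependent
L_shuffled = L_paired[np.ix_(perm, perm)]        # same y values, pairing destroyed
score_paired = hsic(K, L_paired)
score_shuffled = hsic(K, L_shuffled)
```

The paired score is the larger of the two, which is the property KS-HSIC exploits when it searches over permutations.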



KS-HSIC is formulated as follows (Quadrianto et al., 2010):

$$\max_{\Pi} \mathrm{HSIC}(Z(\Pi)), \tag{2}$$

where

$$\mathrm{HSIC}(Z(\Pi)) = \mathrm{tr}(\bar{K} \Pi^\top \bar{L} \Pi). \tag{3}$$

This optimization problem is called the quadratic assignment problem (QAP) (Finke et al., 1987), and it is known to be NP-hard. There exist several QAP solvers such as methods based on simulated annealing, tabu search, and genetic algorithms. However, those QAP solvers are not easy to use in practice since they contain various tuning parameters.

Another approach to solving Eq.(2) based on a linear assignment problem (LAP) (Kuhn, 1955) was proposed in Quadrianto et al. (2010), which is explained below. Let us relax the permutation indicator matrix $\Pi$ to take real values:

$$\Pi \in [0, 1]^{n \times n}, \quad \Pi 1_n = 1_n, \quad \text{and} \quad \Pi^\top 1_n = 1_n. \tag{4}$$

Then, Eq.(3) is convex with respect to $\Pi$ (see Lemma 7 in Quadrianto et al., 2010), and its lower bound can be obtained using some $\tilde{\Pi}$ as follows:

$$\begin{aligned}
\mathrm{tr}(\bar{K} \Pi^\top \bar{L} \Pi)
&\ge \mathrm{tr}(\bar{K} \tilde{\Pi}^\top \bar{L} \tilde{\Pi}) + \left\langle \Pi - \tilde{\Pi}, \frac{\partial \mathrm{HSIC}(Z(\tilde{\Pi}))}{\partial \Pi} \right\rangle \\
&= 2 \, \mathrm{tr}(\bar{K} \Pi^\top \bar{L} \tilde{\Pi}) - \mathrm{tr}(\bar{K} \tilde{\Pi}^\top \bar{L} \tilde{\Pi}),
\end{aligned}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product between matrices. Based on the above lower bound, Quadrianto et al. (2010) proposed to update the permutation matrix as

$$\Pi^{\mathrm{new}} = (1 - \eta) \Pi^{\mathrm{old}} + \eta \operatorname*{argmax}_{\Pi} \mathrm{tr}(\Pi^\top \bar{L} \Pi^{\mathrm{old}} \bar{K}), \tag{5}$$

where $0 < \eta \le 1$ is a step size. The second term is an LAP subproblem, which can be efficiently solved by using the Hungarian method.

In the original KS-HSIC paper (Quadrianto et al., 2010), a C++ implementation of the Hungarian method provided by Cooper¹ was used for solving Eq.(5); then $\Pi$ is kept updated by Eq.(5) until convergence.
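The iteration of Eq.(5) can be sketched in a few lines of Python. This is not the C++ implementation mentioned above: it uses SciPy's `linear_sum_assignment` as the LAP solver, and the function names, the stopping rule, and the toy data are assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def center(K):
    """Return Gamma K Gamma with Gamma = I_n - (1/n) 1_n 1_n^T."""
    n = K.shape[0]
    G = np.eye(n) - np.ones((n, n)) / n
    return G @ K @ G

def objective(Kb, Lb, Pi):
    """tr(Kbar Pi^T Lbar Pi), i.e. the HSIC score of Eq. (3)."""
    return float(np.trace(Kb @ Pi.T @ Lb @ Pi))

def lap_step(Kb, Lb, Pi_old):
    """argmax over permutation matrices Pi of tr(Pi^T Lbar Pi_old Kbar)."""
    C = Lb @ Pi_old @ Kb                      # tr(Pi^T C) = sum_ij Pi_ij * C_ij
    rows, cols = linear_sum_assignment(C, maximize=True)
    Pi = np.zeros_like(C)
    Pi[rows, cols] = 1.0
    return Pi

def ks_hsic(K, L, Pi0, eta=1.0, max_iter=20):
    """Repeat the update of Eq. (5) until a fixed point or max_iter."""
    Kb, Lb = center(K), center(L)
    Pi = Pi0.copy()
    for _ in range(max_iter):
        Pi_new = (1.0 - eta) * Pi + eta * lap_step(Kb, Lb, Pi)
        if np.allclose(Pi_new, Pi):           # fixed point reached
            break
        Pi = Pi_new
    return Pi

rng = np.random.default_rng(0)
x = rng.normal(size=(15, 1))
K = np.exp(-(x - x.T) ** 2 / 2.0)             # Gaussian kernel, width 1 (arbitrary)
p = rng.permutation(15)
L = K[np.ix_(p, p)]                           # same objects, shuffled order
Pi0 = np.eye(15)
Pi_hat = ks_hsic(K, L, Pi0)
```

With $\eta = 1$ and a permutation matrix as the starting point, every iterate stays a permutation matrix, and the lower bound above implies that the HSIC score does not decrease from one iterate to the next.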



In this iterative optimization procedure, the choice of initial permutation matrices is critical to obtain a good solution. Quadrianto et al. (2010) proposed the following initialization scheme. Suppose the kernel matrices $\bar{K}$ and $\bar{L}$ are rank one, i.e., for some $f$ and $g$, $\bar{K}$ and $\bar{L}$ can be expressed as $\bar{K} = f f^\top$ and $\bar{L} = g g^\top$. Then HSIC can be written as

$$\mathrm{HSIC}(Z(\Pi)) = \| f^\top \Pi g \|^2. \tag{6}$$

The initial permutation matrix is determined so that Eq.(6) is maximized. According to Theorems 368 and 369 in Hardy et al. (1952), the maximum of Eq.(6) is attained when the elements of $f$ and $\Pi g$ are ordered in the same way. That is, if the elements of $f$ are ordered in the ascending manner (i.e., $f_1 \le f_2 \le \cdots \le f_n$), the maximum of Eq.(6) is attained by ordering the elements of $g$ in the same ascending way. However, since the kernel matrices $\bar{K}$ and $\bar{L}$ may not be rank one in practice, the principal eigenvectors of $\bar{K}$ and $\bar{L}$ were used as $f$ and $g$ in the original KS-HSIC paper (Quadrianto et al., 2010). We call this eigenvalue-based initialization.
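A minimal sketch of the eigenvalue-based initialization follows. Two details are assumptions introduced here and are not taken from the text: the column convention for $\Pi$ (column $i$ has its 1 in row $\pi(i)$), and the handling of the sign ambiguity of eigenvectors by trying both orientations of $g$ and keeping the one with the larger score.

```python
import numpy as np

def eig_init(Kb, Lb):
    """Match the sorted principal eigenvectors of Kb and Lb."""
    f = np.linalg.eigh(Kb)[1][:, -1]           # eigenvector of the largest eigenvalue
    g = np.linalg.eigh(Lb)[1][:, -1]
    n = len(f)
    best_score, best_Pi = None, None
    for s in (1.0, -1.0):                      # eigenvector sign is arbitrary; try both
        rf, rg = np.argsort(f), np.argsort(s * g)
        pi = np.empty(n, dtype=int)
        pi[rf] = rg                            # k-th smallest f pairs with k-th smallest s*g
        Pi = np.zeros((n, n))
        Pi[pi, np.arange(n)] = 1.0
        score = np.trace(Kb @ Pi.T @ Lb @ Pi)
        if best_score is None or score > best_score:
            best_score, best_Pi = score, Pi
    return best_Pi

rng = np.random.default_rng(0)
f0, g0 = rng.normal(size=5), rng.normal(size=5)
K1, L1 = np.outer(f0, f0), np.outer(g0, g0)    # exactly rank-one test matrices
Pi_init = eig_init(K1, L1)
```

In the exactly rank-one case used above, the returned matrix attains the maximum of the score over all permutations, which can be confirmed by brute force for small $n$.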

Since HSIC is a distribution-free dependence measure, KS-HSIC is more flexible than KS-MI. However, a critical weakness of HSIC is that its performance is sensitive to the choice of kernels (Jagarlamudi et al., 2010). A practical heuristic is to use the Gaussian kernel with width set to the median distance between samples (see, e.g., Fukumizu et al., 2009a), but this does not always work well in practice.

4 Proposed Methods 

In this section, we propose two alternative CDOM 
methods that can naturally address the model selec- 
tion problem. 

4.1 Kernelized Sorting with Normalized 
Cross-Covariance Operator 

The kernel-based dependence measure based on the normalized cross-covariance operator (NOCCO) (Fukumizu et al., 2009b) is given as follows:

$$D_{\mathrm{NOCCO}}(Z) = \mathrm{tr}(\tilde{K} \tilde{L}),$$

where $\tilde{K} = \bar{K} (\bar{K} + n \epsilon I_n)^{-1}$, $\tilde{L} = \bar{L} (\bar{L} + n \epsilon I_n)^{-1}$, and $\epsilon > 0$ is a regularization parameter. $D_{\mathrm{NOCCO}}$ was shown to be asymptotically independent of the choice of kernels. Thus, KS with $D_{\mathrm{NOCCO}}$ (KS-NOCCO) is expected to be less sensitive to the kernel parameter choice than KS-HSIC.

The permuted version of $\tilde{L}$ can be written as

$$\begin{aligned}
\tilde{L}(\Pi) &= \Pi^\top \bar{L} \Pi (\Pi^\top \bar{L} \Pi + n \epsilon I_n)^{-1} \\
&= \Pi^\top \bar{L} (\bar{L} + n \epsilon I_n)^{-1} \Pi \\
&= \Pi^\top \tilde{L} \Pi,
\end{aligned}$$

where we used the orthogonality of $\Pi$ (i.e., $\Pi^\top \Pi = \Pi \Pi^\top = I_n$). Thus, the dependency measure for $Z(\Pi)$ can be written as

$$D_{\mathrm{NOCCO}}(Z(\Pi)) = \mathrm{tr}(\tilde{K} \Pi^\top \tilde{L} \Pi).$$

Since this is essentially the same form as HSIC, a local optimal solution may be obtained in the same way as KS-HSIC:

$$\Pi^{\mathrm{new}} = (1 - \eta) \Pi^{\mathrm{old}} + \eta \operatorname*{argmax}_{\Pi} \mathrm{tr}(\Pi^\top \tilde{L} \Pi^{\mathrm{old}} \tilde{K}). \tag{7}$$

However, the property that $D_{\mathrm{NOCCO}}$ is independent of the kernel choice holds only asymptotically. Thus, with finite samples, $D_{\mathrm{NOCCO}}$ does depend on the choice of kernels as well as the regularization parameter $\epsilon$, which needs to be manually tuned.

¹ http://mit.edu/harold/www/code.html
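The identity $\tilde{L}(\Pi) = \Pi^\top \tilde{L} \Pi$ used in this subsection is easy to check numerically. In the sketch below, random positive semidefinite matrices stand in for the centered kernel matrices, and the function names and the value of $\epsilon$ are illustrative assumptions.

```python
import numpy as np

def nocco_transform(Kbar, eps):
    """Return Ktilde = Kbar (Kbar + n * eps * I_n)^{-1}."""
    n = Kbar.shape[0]
    return Kbar @ np.linalg.inv(Kbar + n * eps * np.eye(n))

def d_nocco(Kbar, Lbar, eps):
    """NOCCO-based dependence score tr(Ktilde Ltilde)."""
    return float(np.trace(nocco_transform(Kbar, eps) @ nocco_transform(Lbar, eps)))

rng = np.random.default_rng(0)
n, eps = 6, 0.05
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))
Kbar, Lbar = B @ B.T, A @ A.T                   # hypothetical PSD stand-ins
Pi = np.eye(n)[:, rng.permutation(n)]           # a random permutation matrix
lhs = nocco_transform(Pi.T @ Lbar @ Pi, eps)    # transform after permuting
rhs = Pi.T @ nocco_transform(Lbar, eps) @ Pi    # permute after transforming
```

Because the two sides agree, $\tilde{K}$ and $\tilde{L}$ need to be computed only once, and each iteration of Eq.(7) costs the same as an iteration of KS-HSIC.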

4.2 Least-Squares Object Matching 

Next, we propose an alternative method called least-squares object matching (LSOM), in which we employ least-squares mutual information (LSMI) (Suzuki et al., 2009) as a dependency measure. LSMI is a consistent estimator of the squared-loss mutual information (SMI) achieving the optimal convergence rate. SMI is defined and expressed as

$$\begin{aligned}
\mathrm{SMI}(Z) &= \frac{1}{2} \iint \left( \frac{p(X, Y)}{p(X) p(Y)} - 1 \right)^2 p(X) p(Y) \, dX \, dY \\
&= \frac{1}{2} \iint \frac{p(X, Y)}{p(X) p(Y)} \, p(X, Y) \, dX \, dY - \frac{1}{2}.
\end{aligned} \tag{8}$$
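The second equality in Eq.(8) follows by expanding the square. A short derivation is given below, where $r := p(X, Y) / (p(X) p(Y))$ is a shorthand for the density ratio introduced here only for brevity:

```latex
\begin{align*}
\frac{1}{2} \iint (r - 1)^2 \, p(X) p(Y) \, dX \, dY
  &= \frac{1}{2} \iint r^2 \, p(X) p(Y) \, dX \, dY
     - \iint r \, p(X) p(Y) \, dX \, dY
     + \frac{1}{2} \iint p(X) p(Y) \, dX \, dY \\
  &= \frac{1}{2} \iint r \, p(X, Y) \, dX \, dY - 1 + \frac{1}{2} \\
  &= \frac{1}{2} \iint r \, p(X, Y) \, dX \, dY - \frac{1}{2}.
\end{align*}
```

The second line uses $r \, p(X) p(Y) = p(X, Y)$, which integrates to one, as does $p(X) p(Y)$.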

Note that SMI is the Pearson divergence (Pearson, 1900) from $p(X, Y)$ to $p(X) p(Y)$, while ordinary MI is the Kullback-Leibler divergence (Kullback and Leibler, 1951) from $p(X, Y)$ to $p(X) p(Y)$. SMI is zero if and only if $X$ and $Y$ are independent, as is ordinary MI. Its estimator LSMI is given as follows (Suzuki et al., 2009):



$$\mathrm{LSMI}(Z) = \frac{1}{2} \hat{\alpha}^\top \hat{h} - \frac{1}{2},$$

where

$$\hat{\alpha} = (\hat{H} + \lambda I_n)^{-1} \hat{h}, \quad
\hat{H} = \frac{1}{n^2} (K K^\top) \circ (L L^\top), \quad
\hat{h} = \frac{1}{n} (K \circ L) 1_n.$$

Here, $\lambda$ $(> 0)$ is the regularization parameter. Since cross-validation (CV) with respect to SMI is possible for model selection, tuning parameters in LSMI (i.e., the Gaussian kernel width and the regularization parameter) can be objectively optimized. This is a notable advantage over kernel-based approaches.

Below, we use the following equivalent expression of LSMI:

$$\mathrm{LSMI}(Z) = \frac{1}{2n} \mathrm{tr}(L \hat{A} K) - \frac{1}{2}, \tag{9}$$

where $\hat{A}$ is the diagonal matrix with diagonal elements given by $\hat{\alpha}$. Note that we used Eq.(73) and Eq.(75) in Minka (2000) for obtaining the above expression.
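The two expressions of LSMI can be compared numerically. The following sketch assumes the standard LSMI quantities $\hat{H} = \frac{1}{n^2}(K K^\top) \circ (L L^\top)$, $\hat{h} = \frac{1}{n}(K \circ L) 1_n$, and $\hat{\alpha} = (\hat{H} + \lambda I_n)^{-1} \hat{h}$; the toy data, the kernel width 0.3, and $\lambda = 0.1$ are arbitrary choices made here.

```python
import numpy as np

def lsmi(K, L, lam):
    """Return ((1/2) alpha^T h - 1/2, alpha) for kernel matrices K and L."""
    n = K.shape[0]
    H = (K @ K.T) * (L @ L.T) / n ** 2           # '*' is the Hadamard product
    h = (K * L) @ np.ones(n) / n
    alpha = np.linalg.solve(H + lam * np.eye(n), h)
    return float(0.5 * alpha @ h - 0.5), alpha

def lsmi_trace_form(K, L, alpha):
    """Trace expression of Eq. (9): tr(L A K) / (2n) - 1/2."""
    n = K.shape[0]
    return float(np.trace(L @ np.diag(alpha) @ K) / (2 * n) - 0.5)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(20, 1))
y = np.sin(3.0 * x)                              # hypothetical dependent toy data
K = np.exp(-(x - x.T) ** 2 / (2 * 0.3 ** 2))
L = np.exp(-(y - y.T) ** 2 / (2 * 0.3 ** 2))
value, alpha = lsmi(K, L, lam=0.1)
value_trace = lsmi_trace_form(K, L, alpha)
```

The trace form is convenient because, once $\hat{\alpha}$ is fixed, it is linear in a permuted kernel matrix, which is what allows an LAP-style update.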



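LSMI for the permuted data $Z(\Pi)$ is given by

$$\mathrm{LSMI}(Z(\Pi)) = \frac{1}{2n} \mathrm{tr}(\Pi^\top L \Pi \hat{A}_{\Pi} K) - \frac{1}{2},$$

where $\hat{A}_{\Pi}$ is the diagonal matrix with diagonal elements given by $\hat{\alpha}_{\Pi}$, and $\hat{\alpha}_{\Pi}$ is given by

$$\hat{\alpha}_{\Pi} = (\hat{H}_{\Pi} + \lambda I_n)^{-1} \hat{h}_{\Pi}, \quad
\hat{H}_{\Pi} = \frac{1}{n^2} (K K^\top) \circ (\Pi^\top L L^\top \Pi), \quad
\hat{h}_{\Pi} = \frac{1}{n} \left( K \circ (\Pi^\top L \Pi) \right) 1_n.$$

Consequently, LSOM is formulated as follows:

$$\max_{\Pi} \mathrm{LSMI}(Z(\Pi)).$$

Since this optimization problem is in general NP-hard and is not convex, we simply use the same optimization strategy as KS-HSIC, i.e., for the current $\Pi^{\mathrm{old}}$, the solution is updated as

$$\Pi^{\mathrm{new}} = (1 - \eta) \Pi^{\mathrm{old}} + \eta \operatorname*{argmax}_{\Pi} \mathrm{tr}(\Pi^\top L \Pi^{\mathrm{old}} \hat{A}_{\Pi^{\mathrm{old}}} K). \tag{10}$$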
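One update of Eq.(10) with step size $\eta = 1$ might look as follows in Python. As in the other sketches, SciPy's `linear_sum_assignment` is used for the LAP subproblem, and the helper names, the toy data, and the parameter values are assumptions made here rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsmi_alpha(K, Lp, lam):
    """LSMI coefficients for kernel matrix K and the permuted kernel matrix Lp."""
    n = K.shape[0]
    H = (K @ K.T) * (Lp @ Lp.T) / n ** 2
    h = (K * Lp) @ np.ones(n) / n
    return np.linalg.solve(H + lam * np.eye(n), h)

def lsom_step(K, L, Pi_old, lam):
    """One update of Eq. (10) with step size eta = 1."""
    alpha = lsmi_alpha(K, Pi_old.T @ L @ Pi_old, lam)
    C = L @ Pi_old @ np.diag(alpha) @ K          # tr(Pi^T C) = sum_ij Pi_ij * C_ij
    rows, cols = linear_sum_assignment(C, maximize=True)
    Pi_new = np.zeros_like(C)
    Pi_new[rows, cols] = 1.0
    return Pi_new

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(12, 1))
K = np.exp(-(x - x.T) ** 2 / (2 * 0.3 ** 2))     # width 0.3 is an arbitrary choice
p = rng.permutation(12)
L = K[np.ix_(p, p)]                              # same objects, shuffled order
Pi1 = lsom_step(K, L, np.eye(12), lam=0.1)
```

Note that $\hat{\alpha}_{\Pi^{\mathrm{old}}}$ is recomputed from the current permutation at every step, which is the main difference from the KS-HSIC and KS-NOCCO updates.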



5 Experiments 



In this section, we first illustrate the behavior of the 
proposed methods using a toy data set, and then ex- 
perimentally evaluate our proposed algorithms in the 
image matching, unpaired voice conversion, and photo 
album summarization tasks. 

In all the methods, we use the Gaussian kernels:

$$K(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2 \sigma_x^2} \right), \quad
L(y, y') = \exp\left( -\frac{\|y - y'\|^2}{2 \sigma_y^2} \right),$$

and we set the maximum number of iterations for updating permutation matrices to 20 and the step size $\eta$ to 1. To avoid falling into undesirable local optima, optimization is carried out 10 times with different initial permutation matrices, which are determined by the eigenvalue-based initialization heuristic with Gaussian kernel widths

$$(\sigma_x, \sigma_y) = c \times (m_x, m_y),$$

where $c = 1^{1/2}, 2^{1/2}, \ldots, 10^{1/2}$, and

$$m_x = 2^{-1/2} \, \mathrm{median}(\{\|x_i - x_j\|\}_{i,j=1}^{n}), \quad
m_y = 2^{-1/2} \, \mathrm{median}(\{\|y_i - y_j\|\}_{i,j=1}^{n}).$$

Makoto Yamada, Masashi Sugiyama

Figure 1: Illustrative example. Panels: (a) Unpaired data. (b) Eigenvalue-based initialization. (c) Matched result by KS-NOCCO. (d) Matched result by LSOM. (e) Values of empirical $D_{\mathrm{NOCCO}}$ score in KS-NOCCO. (f) Values of empirical SMI score in LSOM. (a)-(d): The solid line denotes the true function and the circles denote samples (horizontal axis: X, vertical axis: Y). (e): Values of empirical $D_{\mathrm{NOCCO}}$ score as a function of the number of iterations. (f): Values of empirical SMI score as a function of the number of iterations.


In KS-HSIC and KS-NOCCO, we use the Gaussian kernel with the following widths:

$$(\sigma_x, \sigma_y) = c' \times (m_x, m_y),$$

where $c' = 1^{1/2}, 10^{1/2}$. In KS-NOCCO, we use the following regularization parameters:

$$\epsilon = 0.01, 0.05.$$

In LSOM, we choose the model parameters of LSMI, $\sigma_x$, $\sigma_y$, and $\lambda$, by 2-fold CV from

$$(\sigma_x, \sigma_y) = c \times (m_x, m_y),$$

$$\lambda = 10^{-1}, 10^{-2}, 10^{-3}.$$

5.1 Illustrative Example 

Here, we illustrate the behavior of the proposed KS- 
NOCCO and LSOM using a toy matching dataset. 

Let us consider the following regression model: 
Y = X^, 

where $X$ is subject to the uniform distribution on $(-1, 1)$. We draw 100 paired samples of $X$ and $Y$ following the above generative model (i.e., $\{(x_i, y_i)\}_{i=1}^{100}$). Then, given that $\{y_i\}_{i=1}^{100}$ are randomly shuffled, the goal is to recover the original correspondence. In KS-NOCCO, we set the Gaussian kernel width to

$$(\sigma_x, \sigma_y) = 10^{1/2} \times (m_x, m_y),$$

and $\epsilon = 0.05$.



Figure 1(a) shows the original unpaired data, where the true function is shown by the solid line. Figure 1(b) shows the matched pairs with eigenvalue-based initialization, and Figures 1(c) and 1(d) show the matched pairs by KS-NOCCO and LSOM. The graphs show that matching is performed correctly by KS-NOCCO and LSOM. Figures 1(e) and 1(f) show the values of $D_{\mathrm{NOCCO}}$ and LSMI scores as functions of the number of iterations. This shows that a local optimal solution was obtained in only one iteration.

(b) KS-NOCCO with different Gaus-

Figure 2: Image matching results. The best method in terms of the mean error and comparable methods according to the t-test at the significance level 1% are specified by 'o'.

Figure 3: Image matching result by LSOM. In this case, 234 out of 320 images (73.1%) are matched correctly.

5.2 Image Matching 

Next, let us consider a toy image matching problem: we vertically divide images of size 40 x 40 pixels in the middle, and make two sets of half-images $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$. Given that $\{y_i\}_{i=1}^{n}$ is randomly permuted, the goal is to recover the correct correspondence.
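The data preparation for this task can be sketched as follows. Random arrays stand in for real images, a vertical cut is read here as producing left and right halves, and the function names and the accuracy measure are illustrative assumptions.

```python
import numpy as np

def split_halves(images):
    """Split (n, h, w) images into flattened left and right halves."""
    n, h, w = images.shape
    left = images[:, :, : w // 2].reshape(n, -1)
    right = images[:, :, w // 2 :].reshape(n, -1)
    return left, right

def matching_rate(pi_est, pi_true):
    """Fraction of objects whose estimated partner is the true one."""
    return float(np.mean(np.asarray(pi_est) == np.asarray(pi_true)))

rng = np.random.default_rng(0)
images = rng.random((8, 40, 40))                 # random stand-ins for real images
x, y = split_halves(images)
shuffle = rng.permutation(8)
y_shuffled = y[shuffle]                          # y_shuffled[k] = y[shuffle[k]]
pi_true = np.argsort(shuffle)                    # x_i pairs with y_shuffled[pi_true[i]]
```

A matching method would receive only `x` and `y_shuffled`; its estimated permutation can then be scored against `pi_true` with `matching_rate`.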

Figure 2 summarizes the average correct matching rate over 100 runs as functions of the number of images, showing that the proposed LSOM method tends to outperform the best tuned KS-HSIC and KS-NOCCO methods. Figure 3 depicts an example of image matching results obtained by LSOM, showing that most of the images are matched correctly.



5.3 Unpaired Voice Conversion 

Next, we consider an unpaired voice conversion task, which is aimed at matching the voice of a source speaker with that of a target speaker.

In this experiment, we use 200 short utterance samples recorded from two male speakers in French, with sampling rate 44.1kHz. We first convert the utterance samples to 50-dimensional line spectral frequencies (LSF) vectors (Kain and Macon, 1988). We denote the source and target LSF vectors by $x$ and $y$, respectively. Then the voice conversion task can be regarded as a multi-dimensional regression problem of learning a function from $x$ to $y$. However, different from a standard regression setup, paired training samples are not available; instead, only unpaired samples $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$ are given.

By CDOM, we first match $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$, and then we train a multi-dimensional kernel regression model (Schölkopf and Smola, 2002) using the matched samples $\{(x_{\pi(i)}, y_i)\}_{i=1}^{n}$ as

$$\min_{W} \sum_{i=1}^{n} \| y_i - W^\top k(x_{\pi(i)}) \|^2 + \frac{\delta}{2} \mathrm{tr}(W^\top W),$$

where

$$k(x) = (K(x, x_{\pi(1)}), \ldots, K(x, x_{\pi(n)}))^\top, \quad
K(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2 \tau^2} \right).$$

Here, $\tau$ is a Gaussian kernel width and $\delta$ is a regularization parameter; they are chosen by 2-fold CV.
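Setting the gradient of the regression objective above to zero gives the closed-form solution $W = (\Phi^\top \Phi + \frac{\delta}{2} I_n)^{-1} \Phi^\top Y$, where row $i$ of $\Phi$ is $k(x_{\pi(i)})^\top$ and row $i$ of $Y$ is $y_i^\top$. The sketch below implements this under stated assumptions: the symbols $\Phi$ and $Y$, the function names, the random toy data, and the parameter values are introduced here for illustration only.

```python
import numpy as np

def gauss(A, B, tau):
    """Gaussian kernel values between the rows of A and the rows of B."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * tau ** 2))

def fit(X_matched, Y, tau, delta):
    """Minimize sum_i ||y_i - W^T k(x_pi(i))||^2 + (delta / 2) tr(W^T W)."""
    Phi = gauss(X_matched, X_matched, tau)       # row i is k(x_pi(i))^T
    n = Phi.shape[0]
    return np.linalg.solve(Phi.T @ Phi + 0.5 * delta * np.eye(n), Phi.T @ Y)

def predict(X_new, X_matched, W, tau):
    """Map new source vectors to the target domain: rows are W^T k(x)."""
    return gauss(X_new, X_matched, tau) @ W

rng = np.random.default_rng(0)
X_matched = rng.normal(size=(10, 3))             # stand-ins for matched source vectors
Y = rng.normal(size=(10, 2))                     # stand-ins for target vectors
tau, delta = 1.0, 0.1                            # arbitrary; the text selects them by CV
W = fit(X_matched, Y, tau, delta)
Phi = gauss(X_matched, X_matched, tau)
grad = -2.0 * Phi.T @ (Y - Phi @ W) + delta * W  # gradient of the objective at W
Y_pred = predict(X_matched[:4], X_matched, W, tau)
```

The gradient evaluated at the fitted `W` vanishes up to rounding error, which confirms that the closed form minimizes the stated objective.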

We repeat the experiments 100 times by randomly shuffling training and test samples, and evaluate the voice conversion performance by log-spectral distance for 8000 test samples² (Quackenbush et al.). Figure 4 shows the true spectral envelopes and their estimates, and Figure 5 shows the average performance over 100 runs as a function of the number of training samples. These results show that the proposed LSOM tends to outperform KS-NOCCO and KS-HSIC.

² The smaller the spectral distortion is, the better the quality of voice conversion is.

Figure 4: True spectral envelopes and their estimates.

Figure 5: Unpaired voice conversion results. The best method in terms of the mean error and comparable methods according to the t-test at the significance level 1% are specified by 'o'.



5.4 Photo Album Summarization

Finally, we apply the proposed LSOM method to a photo album summarization problem, where photos are automatically aligned into a designed frame expressed in the Cartesian coordinate system.

We use 320 images with RGB format used in Quadrianto et al. (2010), which were originally extracted from Flickr³. We first convert the images from RGB to Lab space and resize them to 40 x 40 pixels. Next, we convert a 40 x 40 x 3 (= 4800) image into a 4800-dimensional vector. We first consider a rectangular frame of 16 x 20 (= 320), and arrange the images in this rectangular frame. Figure 6(a) depicts the photo album summarization result, showing that images with similar colors are aligned closely.

Similarly, we use the Frey face dataset (Roweis and Saul, 2000), which consists of 225 gray-scale face images with 28 x 20 (= 560) pixels. We similarly convert an image into a 560-dimensional vector, and we set the grid size to 15 x 15 (= 225). The results depicted in Figure 6(b) show that similar face images (in terms of the angle and facial expressions) are assigned to nearby cells in the grid.

Next, we apply LSOM to the USPS dataset (Hastie et al., 2001). In this experiment, we use 320 gray-scale images of digit '7' with 16 x 16 (= 256) pixels. We convert an image into a 256-dimensional vector, and we set the grid size to 16 x 20 (= 320). The result depicted in Figure 6(c) shows that digits with similar profiles are aligned closely.

Finally, we align the Flickr, Frey face, and USPS images into more complex frames: a Japanese character 'mountain', a smiley-face shape, and a '777' digit shape. The results depicted in Figure 7 show that images with similar profiles are located in nearby grid coordinate cells.
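In this task the second domain consists of grid coordinates rather than images. The sketch below shows one way to build the coordinates of a rectangular frame and to turn a matching into a layout; the function names, the row-major cell ordering, and the use of integer cell coordinates are assumptions made here for illustration.

```python
import numpy as np

def grid_frame(rows, cols):
    """Cartesian coordinates of the cells of a rows x cols rectangular frame."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.stack([r.ravel(), c.ravel()], axis=1).astype(float)

def layout(pi, rows, cols):
    """Grid whose cell j holds the index of the image assigned to it."""
    pi = np.asarray(pi)
    cells = np.empty(rows * cols, dtype=int)
    cells[pi] = np.arange(len(pi))               # image i goes to cell pi[i]
    return cells.reshape(rows, cols)

Y = grid_frame(16, 20)                           # 320 two-dimensional coordinates
small = layout([2, 0, 1, 3], 2, 2)               # hypothetical matching of 4 images
```

A non-rectangular frame could be represented in the same way by keeping only the rows of `Y` whose cells belong to the desired shape.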



³ http://www.flickr.com



Figure 6: Images are automatically aligned into rectangular grid frames expressed in the Cartesian coordinate system. (a) Layout of 320 images into a 2D grid of size 16 by 20 using LSOM. (b) Layout of 225 facial images into a 2D grid of size 15 by 15 using LSOM. (c) Layout of 320 digit '7' into a 2D grid of size 16 by 20 using LSOM.

Figure 7: Images are automatically aligned into complex grid frames expressed in the Cartesian coordinate system. (a) Layout of 120 images into a Japanese character 'mountain' by LSOM. (b) Layout of 153 facial images into 'smiley' by LSOM. (c) Layout of 199 digit '7' into '777' by LSOM.

6 Conclusion

In this paper, we proposed two alternative methods of cross-domain object matching (CDOM). The first method uses the dependence measure based on the normalized cross-covariance operator, which is advantageous over HSIC in that it is asymptotically independent of the choice of kernels. However, with finite samples, it still depends on the choice of kernels, which needs to be manually tuned. To cope with this problem, we proposed a more practical CDOM approach called least-squares object matching (LSOM). LSOM adopts squared-loss mutual information as a dependence measure, and it is estimated by the method of least-squares mutual information (LSMI). A notable advantage of the LSOM method is that it is equipped with a natural cross-validation procedure that allows us to objectively optimize tuning parameters such as the Gaussian kernel width and the regularization parameter in a data-dependent fashion. We applied the proposed methods to the image matching, unpaired voice conversion, and photo album summarization tasks, and experimentally showed that LSOM is the most promising.